Privacy preserving data mining: a signal processing perspective and a simple data perturbation protocol

Author

  • Chai Wah Wu
Abstract

Privacy concerns over the widespread gathering of personal information by various institutions over the internet have led to the development of data mining algorithms that preserve the privacy of those whose personal data are collected and analyzed. A novel approach to such privacy preserving data mining algorithms was proposed in which each datum in a data set is perturbed by adding a random value from a known distribution. In these applications, the distribution of the original data set is important, and estimating it is one of the goals of the data mining algorithm. This distribution is estimated via an iterative algorithm such as the Expectation Maximization (EM) algorithm, which was shown to have desirable properties such as low privacy loss and high fidelity estimates of the distribution. Each iteration of EM requires computation proportional to the size of the data set, so estimating the distribution can take a large amount of computation time. In this paper we propose two ways to reduce the amount of computation. First, we show that the problem can be recast as a deconvolution problem, so that signal processing algorithms can be applied to solve it. In particular, we consider both a direct method and iterative methods, which are more robust against noise and ill-conditioning. We show that the Richardson-Lucy deblurring algorithm is equivalent to EM after quantization. The signal processing approach also shows how the choice of perturbation affects information loss and privacy loss, and allows us to clarify some points made in the literature. In the second part of this paper, we propose a scheme for perturbing data which also has the desirable properties of arbitrarily small privacy loss and arbitrarily high fidelity in the estimate. The main advantage of the proposed scheme is the simplicity of the estimation algorithm. In contrast to iterative algorithms such as EM, the proposed scheme estimates the unknown distribution in one step. This is significant in applications where the data set is very large or when the data mining algorithm is run in an online environment.
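The deconvolution view mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes the data are already quantized into a few integer bins, that the additive noise is discrete with a known probability vector `h` (its values shifted so they index from 0), and all function names and parameters here are hypothetical. Under those assumptions, the histogram of the perturbed data is approximately the convolution of the unknown distribution with `h`, and a Richardson-Lucy style multiplicative update recovers an estimate from that histogram alone.

```python
import random


def convolve(p, h):
    """Full discrete convolution of two probability vectors."""
    out = [0.0] * (len(p) + len(h) - 1)
    for i, p_i in enumerate(p):
        for k, h_k in enumerate(h):
            out[i + k] += p_i * h_k
    return out


def perturbed_histogram(values, h, n_bins, rng):
    """Add noise drawn from h to each quantized value; return the
    normalized histogram of the perturbed values."""
    counts = [0] * (n_bins + len(h) - 1)
    noise = rng.choices(range(len(h)), weights=h, k=len(values))
    for x, k in zip(values, noise):
        counts[x + k] += 1
    total = float(len(values))
    return [c / total for c in counts]


def richardson_lucy(q_obs, h, n_bins, iterations=200):
    """Estimate the original distribution p from the perturbed
    histogram q_obs, assuming q_obs is approximately convolve(p, h)."""
    p = [1.0 / n_bins] * n_bins  # uniform starting guess
    for _ in range(iterations):
        predicted = convolve(p, h)
        ratio = [q / pr if pr > 0 else 0.0
                 for q, pr in zip(q_obs, predicted)]
        # Multiplicative update: correlate the ratio with the noise kernel.
        p = [p[i] * sum(h[k] * ratio[i + k] for k in range(len(h)))
             for i in range(n_bins)]
    return p


if __name__ == "__main__":
    rng = random.Random(0)
    true_p = [0.1, 0.4, 0.3, 0.2]   # hypothetical original distribution
    h = [0.2, 0.6, 0.2]             # hypothetical known noise distribution
    data = rng.choices(range(len(true_p)), weights=true_p, k=20000)
    q_obs = perturbed_histogram(data, h, len(true_p), rng)
    print(richardson_lucy(q_obs, h, len(true_p)))
```

Each iteration costs time proportional to the number of bins times the kernel length rather than to the number of records, which is one way to read the abstract's point about reducing computation: the data set is touched once to build the histogram, and the iterations then operate on the quantized distribution.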


Similar resources

Privacy-Preserving Collaborative Association Rule Mining

In recent times, the development of privacy technologies has accelerated research on privacy-preserving collaborative data mining. Researchers borrowed ideas from secure multi-party computation and developed secure multi-party protocols to deal with privacy-preserving collaborative data mining problems. Random perturbation was also identified to be an efficient estimation technique to so...


On Random Additive Perturbation for Privacy Preserving Data Mining

Title of Thesis: On Random Additive Perturbation for Privacy Preserving Data Mining. Author: Souptik Datta, Master of Science, 2004. Thesis directed by: Dr. Hillol Kargupta, Associate Professor, Department of Computer Science and Electrical Engineering. Privacy is becoming an increasingly important issue in many data mining applications. This has triggered the development of many privacy-preserving...


A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining

The major challenge of data perturbation is to achieve the desired balance between the level of privacy guarantee and the level of data utility. Data privacy and data utility are commonly considered as a pair of conflicting requirements in privacy-preserving data mining systems and applications. Multiplicative perturbation algorithms aim at improving data privacy while maintaining the desired l...


The applicability of the perturbation based privacy preserving data mining for real-world data

The perturbation method has been extensively studied for privacy preserving data mining. In this method, random noise from a known distribution is added to the privacy sensitive data before the data is sent to the data miner. Subsequently, the data miner reconstructs an approximation to the original data distribution from the perturbed data and uses the reconstructed distribution for data minin...


Privacy-preserving Clustering of Data Streams

Because most previous studies on privacy-preserving data mining placed specific importance on the security of massive amounts of data from a static database, data undergoing privacy preservation often leads to a decline in the accuracy of mining results. Furthermore, following the rapid advancement of Internet and telecommunication technology, data types have transformed...




Publication date: 2003